In [1]:
import warnings
warnings.filterwarnings('ignore')

Data Visualization Tutorial

Network Analysis and the NetworkX Library

Motivation

In an increasingly interconnected and data-rich world, networks are a ubiquitous feature of modern life. They appear in a wide variety of fields such as social networks, global value chains, disease outbreaks, mobile phone networks, internet browsing, vehicular flows, transportation, and finance, among others. Network analysis may allow us to:

  • determine important actors/pieces within organizations/systems
  • analyze interconnectivity within a network
  • predict a network's future direction
  • identify clusters within a network
  • examine underlying structures of associations through link analysis
  • map out community organizations through social network analysis

For instance, the image below shows the network of political blogs prior to the 2004 US Presidential election and reveals two densely knit, well-separated communities. image.png (Source: Easley & Kleinberg, 2010)

Network Analysis

The key elements of a graph network are nodes and edges:

  • Nodes are the entities of interest. These could take the form of people, organizations, concepts (akin to concept mapping), or even something as specific as database tables/relations. Network analysis tools, including NetworkX, cover a wide gamut of functionality beyond data visualization and are well equipped to store additional information about the nodes/entities within a network, which we will also cover in this tutorial.
  • Edges are the relationships between the nodes. They may or may not be directional, may represent different types of links in the same network, and may carry weight. Examples are financial transactions (e.g., remittances or online transfers between account holders), manager-staff relations, or shared keys between database tables.

To illustrate these concepts, a simple network graph as well as sample node and edge data tables follow: image.png (Adapted from Professor Taylor Corbett's Data Visualization Lecture 9 for the Fall 2021 Semester)
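
As a minimal sketch of how nodes and edges can carry extra information, the toy network below attaches an attribute to each node and a weight to the edge between them. The names, roles, and amount are illustrative assumptions, not part of the tutorial data.

```python
import networkx as nx

# Hypothetical toy network: people as nodes, a money transfer as the edge.
G = nx.Graph()
G.add_node("Ana", role="manager")   # node with an attribute
G.add_node("Ben", role="staff")
G.add_edge("Ana", "Ben", weight=250, relation="remittance")  # weighted edge

print(G.nodes["Ana"])          # attribute dictionary stored on the node
print(G.edges["Ana", "Ben"])   # attribute dictionary stored on the edge
```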

NetworkX

NetworkX is a Python package for creating, manipulating, visualizing, and studying the structure, dynamics, and functions of complex networks. It can handle graphs with up to 10 million nodes and around 100 million edges.

NetworkX allows users to "load and store networks in standard and nonstandard data formats, generate many types of random and classic networks, analyze network structure, build network models, design new network algorithms, draw networks, and much more".

While NetworkX is one of the most popular Python packages for creating and manipulating graphs and networks, its primary goal is to facilitate graph analysis rather than perform graph visualization. However, the package does include basic drawing functionalities using Matplotlib. Hence, plotting within NetworkX would be more appropriate for simpler networks or for exploratory data analysis. For more advanced graph visualizations, network data within NetworkX may be exported and fed into fully-featured graph visualization tools such as the open source software package Graphviz.

image.png

Demo

Installation

At the time of writing, NetworkX requires Python 3.7 or newer.

To install the latest release of the package, run pip install networkx[default].

To install the package without the dependencies (e.g., numpy, scipy), run pip install networkx.

Alternatively, manual downloads are also possible through NetworkX's GitHub or PyPI repositories.

Simple Plotting

Graph creation consists of five basic steps:

  1. NetworkX package import:     import networkx as nx
  2. Create the graph object:     g = nx.Graph()
  3. Add nodes*:      g.add_node(node)
  4. Add edges:      g.add_edge(node_1, node_2)
  5. Draw the graph:     nx.draw(g)

Note: Step 3 may be skipped as nodes will automatically be created when edges are created between non-existent nodes.
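
The note above can be checked with a small sketch: adding an edge between nodes that do not exist yet creates both of them, while an isolated node still has to be added explicitly. The node names are arbitrary placeholders.

```python
import networkx as nx

g = nx.Graph()
g.add_edge("a", "b")   # neither node exists yet, so both are created
g.add_node("c")        # an isolated node still needs add_node

print(sorted(g.nodes))         # ['a', 'b', 'c']
print(g.number_of_edges())     # 1
```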

In [2]:
# import relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

# Create a networkx graph object
my_graph = nx.Graph() 
 
# Add edges to the graph object
# Each tuple represents an edge between two nodes
my_graph.add_edges_from([
                        (1,2), 
                        (1,3), 
                        (3,4), 
                        (1,5), 
                        (3,5),
                        (4,2),
                        (2,3),
                        (3,0)])
 
# Draw the resulting graph
nx.draw(my_graph, with_labels=True, font_weight='bold')

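We could also import network data from a dataframe by calling the from_pandas_edgelist function with the source dataframe followed by the names of the columns containing the linked nodes. The source column is given before the target column. For the links to be treated as directional, a directed graph type must also be requested through create_using (e.g., create_using=nx.DiGraph), since the default result is an undirected graph.

The draw function also allows for graph customization such as node labeling (with_labels), node size (node_size), transparency (alpha), and node border width (linewidths). Edge width is set separately through width.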

In [3]:
# create a dataframe
df = pd.DataFrame({'from': ['A', 'B', 'C', 'A'], 
                   'to': ['D', 'A', 'E', 'C']})
# create graph object
G = nx.from_pandas_edgelist(df, 'from', 'to')
# plot the network graph
nx.draw(G, with_labels=False, node_size=500, alpha=1, linewidths=20)
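
For directional links, a minimal sketch with the same toy columns is shown below. It assumes we want each row read as source -> target and passes create_using=nx.DiGraph, since from_pandas_edgelist otherwise returns an undirected graph.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({'from': ['A', 'B', 'C', 'A'],
                   'to': ['D', 'A', 'E', 'C']})

# create_using=nx.DiGraph keeps the source -> target direction of each row
DG = nx.from_pandas_edgelist(df, source='from', target='to',
                             create_using=nx.DiGraph)

print(DG.has_edge('A', 'D'))   # True: the edge follows the row's direction
print(DG.has_edge('D', 'A'))   # False: the reverse edge was never added
```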

We could likewise specify node positions (pos), which may come in especially handy to avoid overlapping nodes in more complex networks. There are various options available to tweak the different graph elements. To fix formatting, different matplotlib features may also be integrated in the graphs.

In [4]:
# create nodes and edges
G = nx.Graph()
G.add_edge(1, 2)
G.add_edge(1, 3)
G.add_edge(1, 5)
G.add_edge(2, 3)
G.add_edge(3, 4)
G.add_edge(4, 5)

# set positions
pos = {1: (0, 0), 2: (-1, 0.3), 3: (2, 0.17), 4: (4, 0.255), 5: (5, 0.03)}

options = {
    "font_size": 36,
    "node_size": 3000,
    "node_color": "white",
    "edgecolors": "black",
    "linewidths": 5,
    "width": 5}

nx.draw_networkx(G, pos, **options)

# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()

To create a directed (i.e., directional) graph, we can use the DiGraph class. Again, the source node must be specified before the target node in each edge when the graph object is created.

In [5]:
# create graph object
G = nx.DiGraph([(0, 3), (1, 3), (2, 4), (3, 5), (3, 6), (4, 6), (5, 6)])

# group nodes by column
left_nodes = [0, 1, 2]
middle_nodes = [3, 4]
right_nodes = [5, 6]

# set the position according to column (x-coord)
pos = {n: (0, i) for i, n in enumerate(left_nodes)}
pos.update({n: (1, i + 0.5) for i, n in enumerate(middle_nodes)})
pos.update({n: (2, i + 0.5) for i, n in enumerate(right_nodes)})

nx.draw_networkx(G, pos, **options)

# Set margins for the axes so that nodes aren't clipped
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.show()
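
To see what direction means in practice, the sketch below builds a smaller directed graph (a subset of the edges above, chosen for brevity) and queries it without drawing anything.

```python
import networkx as nx

G = nx.DiGraph([(0, 3), (1, 3), (3, 5)])

print(list(G.successors(3)))     # nodes that 3 points to
print(list(G.predecessors(3)))   # nodes that point to 3
print(G.has_edge(0, 3), G.has_edge(3, 0))   # direction matters
```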

Sample Visualization Exercise

After a survey of NetworkX's basic plotting functionalities, let us try to analyze and visualize a sample social network through Marvel Cinematic Universe Social Network data sourced from Tableau.

In [6]:
# import data
msn = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/marvel_social_network.csv")
msn
Out[6]:
 | Character Name | Line ID | Path | Ch Name Diff | Character ID | Character Name (copy) | Main Character | Name in Caps | Relation | Relation - Negative | ... | Relation Sentiment | Relationship | Set ID | Synergy Type | URL | 1 | Number of Records | S.No. | X | Y
0 | Abomination | 1 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Friends | NaN | ... | Positive | NaN | 1 | Reciprocal | NaN | 1 | 1 | 1 | 6581.620117 | 2663.285156
1 | Rhino | 1 | 2 | Rhino | 83 | Rhino | Abomination | RHINO | Friends | NaN | ... | Positive | Relationship with Abomination - | 1 | Reciprocal | NaN | 1 | 1 | 2 | 7942.243652 | 9485.032227
2 | Abomination | 2 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Nemesis | NaN | ... | Negative | NaN | 1 | Perfect | NaN | 1 | 1 | 3 | 6581.620117 | 2663.285156
3 | Hulk | 2 | 2 | Hulk | 45 | Hulk | Abomination | HULK | Nemesis | Nemesis | ... | Negative | Relationship with Abomination - | 1 | Perfect | NaN | 1 | 1 | 4 | 4651.433594 | 5910.624023
4 | Abomination | 3 | 1 | NaN | 1 | Abomination | Abomination | ABOMINATION | Enemies | NaN | ... | Negative | NaN | 1 | Normal | NaN | 1 | 1 | 5 | 6581.620117 | 2663.285156
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
1939 | Ronan | 970 | 2 | Ronan | 86 | Ronan | Yondu | RONAN | Enemies | Enemies | ... | Negative | Relationship with Yondu - | 118 | Normal | NaN | 1 | 1 | 1940 | 8907.336914 | 9087.875977
1940 | Yondu | 971 | 1 | NaN | 118 | Yondu | Yondu | YONDU | Tech Advancement | NaN | ... | Neutral | NaN | 118 | Normal | NaN | 1 | 1 | 1941 | 8622.555664 | 4088.376221
1941 | Sentinel | 971 | 2 | Sentinel | 88 | Sentinel | Yondu | SENTINEL | Tech Advancement | NaN | ... | Neutral | Relationship with Yondu - | 118 | Normal | NaN | 1 | 1 | 1942 | 6866.401855 | 3270.700928
1942 | Yondu | 972 | 1 | NaN | 118 | Yondu | Yondu | YONDU | Rivals | NaN | ... | Negative | NaN | 118 | Normal | NaN | 1 | 1 | 1943 | 8622.555664 | 4088.376221
1943 | Star-Lord | 972 | 2 | Star-Lord | 96 | Star-Lord | Yondu | STAR-LORD | Rivals | Rivals | ... | Negative | Relationship with Yondu - | 118 | Normal | NaN | 1 | 1 | 1944 | 1075.841797 | 1331.642578

1944 rows × 25 columns

A cursory inspection of the Marvel dataset shows that we must reformat the data from "long" format (where linked characters are listed in separate rows) to "wide" format (where one row corresponds to one link or Line ID) in order to properly feed the data into our NetworkX graph object.

In [7]:
# reformat to combine linked characters in 1 row

# collapse rows by line ID and combine linked characters in a list per cell
msn_nx = msn.groupby('Line ID').agg(lambda x: x.tolist())
msn_nx = msn_nx[["Character Name","Character ID"]]

# split list and allocate separate columns for the linked characters
msn_nx = pd.concat([msn_nx["Character Name"].apply(pd.Series),
          msn_nx["Character ID"].apply(pd.Series)],
          axis=1).reset_index()        

# add relation type
msn_nx = msn_nx.merge(msn[["Line ID","Relation","Relation Sentiment"]],
            on="Line ID", how = "left").drop_duplicates()

# rename columns
msn_nx.columns = ['Line ID', 'Char1_Name', 'Char2_Name', 'Char1_ID', 'Char2_ID', 'Relation', 'Relation Sentiment']

# reset index
msn_nx = msn_nx.reset_index()

# view reformatted data
msn_nx
Out[7]:
 | index | Line ID | Char1_Name | Char2_Name | Char1_ID | Char2_ID | Relation | Relation Sentiment
0 | 0 | 1 | Abomination | Rhino | 1 | 83 | Friends | Positive
1 | 2 | 2 | Abomination | Hulk | 1 | 45 | Nemesis | Negative
2 | 4 | 3 | Abomination | She-Hulk | 1 | 90 | Enemies | Negative
3 | 6 | 4 | Abomination | Red Hulk | 1 | 82 | Enemies | Negative
4 | 8 | 5 | Abomination | King Groot | 1 | 59 | Friends | Positive
... | ... | ... | ... | ... | ... | ... | ... | ...
967 | 1934 | 968 | Yondu | Nightcrawler | 118 | 75 | It Ain't Easy | Neutral
968 | 1936 | 969 | Yondu | Rocket Raccoon | 118 | 84 | Friends | Positive
969 | 1938 | 970 | Yondu | Ronan | 118 | 86 | Enemies | Negative
970 | 1940 | 971 | Yondu | Sentinel | 118 | 88 | Tech Advancement | Neutral
971 | 1942 | 972 | Yondu | Star-Lord | 118 | 96 | Rivals | Negative

972 rows × 8 columns
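
As an alternative sketch of the long-to-wide step (not the approach used in the cell above), pandas' pivot can do the same reshaping on a toy frame. This assumes every Line ID has exactly two rows whose Path values are 1 and 2, as the data preview suggests; the toy rows are a hand-copied excerpt.

```python
import pandas as pd

# Toy stand-in for the long-format data: two rows per Line ID.
long_df = pd.DataFrame({
    "Line ID": [1, 1, 2, 2],
    "Path": [1, 2, 1, 2],
    "Character Name": ["Abomination", "Rhino", "Abomination", "Hulk"],
})

# One row per Line ID, one column per endpoint of the link.
wide_df = (long_df.pivot(index="Line ID", columns="Path",
                         values="Character Name")
                  .rename(columns={1: "Char1_Name", 2: "Char2_Name"})
                  .reset_index())
wide_df.columns.name = None   # drop the leftover "Path" axis label
print(wide_df)
```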

The dataframe is now ready to be loaded into a NetworkX graph object. We just add one last variable to specify colors corresponding to each relationship's Relation Sentiment, which we can subsequently reference when we set edge colors.

In [8]:
# set color according to relation sentiment
msn_nx['color'] = np.where(msn_nx['Relation Sentiment']=="Positive",
                          "green",
                          "black")
msn_nx['color'] = np.where(msn_nx['Relation Sentiment']=="Negative",
                          "red",
                          msn_nx['color'])
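
The same mapping can be written more compactly with a lookup dictionary. The sketch below runs on a small stand-in Series rather than the real column; the palette dictionary name is an illustrative choice.

```python
import pandas as pd

sentiment = pd.Series(["Positive", "Negative", "Neutral"])

# Explicit lookup table; anything not listed falls back to black.
palette = {"Positive": "green", "Negative": "red"}
colors = sentiment.map(palette).fillna("black")

print(colors.tolist())   # ['green', 'red', 'black']
```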

We can now manipulate the data in NetworkX.

Initialization of the graph object

First, we initialize the graph object by specifying our source dataframe, nodes, and edge attributes (edge_attr).

In [9]:
# Initialize a graph object
G = nx.from_pandas_edgelist(msn_nx,
                            'Char1_Name', 
                            'Char2_Name',
                            edge_attr=["Relation","Relation Sentiment"])

We can choose from various layouts for the graph visualization, such as 'bipartite_layout', 'circular_layout', 'kamada_kawai_layout', 'random_layout', 'rescale_layout', 'shell_layout', 'spring_layout', 'spectral_layout', and 'fruchterman_reingold_layout'. For this example, we shall use 'kamada_kawai_layout', which positions nodes using a path-length cost function. We settled on this layout because it produced less node overlap than the other configurations. The NetworkX documentation describes the methods behind the different node positioning algorithms for graph drawing.

In [10]:
# Generate layout for visualization
pos = nx.kamada_kawai_layout(G)

Tweaking node positions/coordinates

Based on our chosen layout's algorithm, node positions will be generated automatically. However, we can also perform manual tweaking to address node overlaps in the visualization, among others. To highlight the effect, we will nudge Captain America to the bottom of the graph.

It is also possible to dispense with NetworkX's built-in layouts and specify user-determined coordinates for all nodes.

In [11]:
# Manual position tweaking
pos["Captain America"] += (0, -1)
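
To make the mechanics concrete, the sketch below uses a three-node toy graph and circular_layout (chosen only because it is simple; it is not the layout used in this tutorial). A layout is a dictionary mapping each node to a coordinate array, so a node can be nudged in place, and a fully manual layout is just a dictionary of (x, y) pairs.

```python
import networkx as nx

G = nx.path_graph(3)            # nodes 0, 1, 2
pos = nx.circular_layout(G)     # dict: node -> numpy array [x, y]

before = pos[0].copy()
pos[0] += (0, -1)               # shift node 0 one unit down
print(before, "->", pos[0])

# Fully manual coordinates: one (x, y) pair per node (values are arbitrary).
manual_pos = {0: (0, 0), 1: (1, 0), 2: (2, 1)}
```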

Tweaking node size

We can also customize node size. In this example, we will set node size proportional to the number of links that the Marvel character has.

In [12]:
# node size is proportional to number of links
links=dict.fromkeys(G.nodes(),0.0)
for (node1,node2,attrib) in G.edges(data=True):
    links[node1]+=1
    links[node2]+=1
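
The tally above is the degree of each node, which NetworkX can also report directly. A small sketch on a made-up four-node graph compares the two approaches:

```python
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

# Manual tally, mirroring the loop above.
links = dict.fromkeys(G.nodes(), 0)
for node1, node2 in G.edges():
    links[node1] += 1
    links[node2] += 1

# Built-in equivalent: the degree of each node.
degrees = dict(G.degree())
print(links == degrees)   # True
print(degrees["C"])       # 3
```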

Aside: As mentioned, the graph object also stores additional information on the nodes and edges, which can be accessed by outside functions (in this case, to count the number of links an entity possesses). To help picture the data structures behind these nodes and edges, here is a sample view of our current graph object's nodes.

In [13]:
G.nodes()
Out[13]:
NodeView(('Abomination', 'Rhino', 'Hulk', 'She-Hulk', 'Red Hulk', 'King Groot', 'M.O.D.O.K.', 'Mordo', 'Agent Venom', 'Carnage', 'Drax', 'Gamora', 'Groot', 'Spider-Man\xa0(Classic)', 'Spider-Man (Symbiote)', 'Venom', 'Venompool', 'Wolverine (X-23)', 'Angela', 'Hela', 'Loki', 'Morningstar', 'Rocket Raccoon', 'Star-Lord', 'Thor', 'Ant-Man', 'Black Panther (Civil War)', 'Doctor Octopus', 'Falcon', 'Iron Man', 'Scarlet Witch', 'Sentinel', 'Yellowjacket', 'Archangel', 'Beast', 'Black Widow', 'Colossus', 'Ghost Rider', 'Iceman', 'Phoenix', 'Psylocke', 'Black Panther', 'Gambit', 'Iron Patriot', 'Karnak', 'Nightcrawler', 'Superior Iron Man', 'Yondu', 'Black Bolt', 'Ms. Marvel (Kamala Khan)', 'Ronan', 'Cyclops (New Xavier School)', 'Kang', 'Medusa', 'Doctor Strange', 'Quake', 'Magneto (Marvel Now!)', 'Iron Fist', 'Iron Fist (Immortal)', 'Storm', 'Deadpool', 'Guillotine', 'War Machine', 'Vision (Age of Ultron)', 'Winter Soldier', 'Killmonger', 'Hawkeye', 'Captain Marvel', 'Crossbones', 'Daredevil (Classic)', 'Elektra', 'Hulk (Ragnarok)', 'Hulkbuster', 'Ms. Marvel', 'Sentry', 'Thor (Jane Foster)', 'Ultron', 'Void', 'Blade', 'Dormammu', 'Mephisto', 'Spider-Man (Stark Enhanced)', 'Cable', 'Deadpool (X-Force)', 'Cyclops\xa0(Blue Team)', 'Rogue', 'Captain America', 'Punisher 2099', 'Wolverine', 'Captain America (WWII)', 'Civil Warrior', 'Juggernaut', 'Magik', 'Old Man Logan', 'Taskmaster', 'Magneto', 'Unstoppable Colossus', 'Daredevil', 'Kingpin', 'Luke Cage', 'Punisher', 'Moon Knight', 'Spider-Gwen', 'Gwenpool', 'Thanos', 'Electro', 'Vulture', 'Doctor Voodoo', 'Hyperion', 'The Hood', 'Nebula', 'Green Goblin', 'Spider-Man (Miles Morales)', 'Howard the Duck', 'Thor (Ragnarok)', 'Joe Fixit', 'Vision', 'Ultron (Classic)'))

Visualization proper

We are now ready to fix the plot elements (through matplotlib) and visualize graph components.

In [15]:
fig, ax = plt.subplots(figsize=(40, 40))

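# draw edges, colored by the sentiment stored on each edge of G
# (msn_nx has more rows than G has edges, so its rows cannot be matched by position)
sentiment_colors = {"Positive": "green", "Negative": "red"}
edge_colors = [sentiment_colors.get(d["Relation Sentiment"], "black")
               for _, _, d in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, alpha=1, width=5, edge_color=edge_colors)

# draw nodes, sized by number of links
nx.draw_networkx_nodes(G, pos,
                       node_size=[links[i]*500 for i in G],
                       node_color="blue", alpha=1)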

# draw labels
label_options = {"ec": "black", "fc": "white", "alpha": .9}
nx.draw_networkx_labels(G, pos, font_size=30, bbox=label_options)

# display title
font = {"color": "black", "fontweight": "bold", "fontsize": 40}
ax.set_title("Marvel Social Network", font)

# Resize figure for label readability
ax.margins(0.1, 0.05)
fig.tight_layout()
plt.axis("off")
plt.show()

As mentioned, NetworkX may not be suitable for production-quality visualizations, especially for complex networks. However, the graph above still offers a few insights into the relationships in our sample social network: connections are a mix of positive and negative, leaning positive, with neutral ones being rare. We also see larger nodes for characters with more extensive connections, such as Captain America, Black Widow, and Hulk. Subsetting for well-connected or less-connected characters can also produce visualizations that are easier to read. For instance, we can use NetworkX's graph analysis techniques to check which character possesses the most links and then generate a graph visualization for that character.

Subsetting and NetworkX's graph analysis functions

To start, we can view sample relationships for our dataset's first character, Abomination:

In [16]:
# view Abomination's links and the nature of these links
G['Abomination']
Out[16]:
AtlasView({'Rhino': {'Relation': 'Friends', 'Relation Sentiment': 'Positive'}, 'Hulk': {'Relation': 'Enemies', 'Relation Sentiment': 'Negative'}, 'She-Hulk': {'Relation': 'Enemies', 'Relation Sentiment': 'Negative'}, 'Red Hulk': {'Relation': 'Enemies', 'Relation Sentiment': 'Negative'}, 'King Groot': {'Relation': 'Friends', 'Relation Sentiment': 'Positive'}, 'M.O.D.O.K.': {'Relation': 'Cubical Mates', 'Relation Sentiment': 'Neutral'}, 'Mordo': {'Relation': 'Enemies', 'Relation Sentiment': 'Negative'}})
In [17]:
# count the number of connections
len(G['Abomination'])
Out[17]:
7
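
The same indexing style reaches individual edge attributes. In the sketch below, a two-edge stand-in graph (hand-built, not loaded from the dataset) shows how to read one attribute and how to filter a node's neighbours by relation type.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("Abomination", "Rhino", Relation="Friends")
G.add_edge("Abomination", "Hulk", Relation="Enemies")

# G[u][v] is the attribute dictionary of the edge between u and v.
print(G["Abomination"]["Hulk"]["Relation"])

# Keep only the neighbours linked by a given relation type.
friends = [n for n, attrs in G["Abomination"].items()
           if attrs["Relation"] == "Friends"]
print(friends)
```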

Using these graph analysis functions, we can check which character has the most connections and then zoom in our analysis on that character.

In [18]:
# initialize
top_links = {}

# iterate through nodes to count connections
for char in G.nodes:
    top_links[char] = len(G[char])
    
# convert to dataframe
s = pd.Series(top_links, name='connections')
df = s.to_frame().sort_values('connections', ascending=False)
df
Out[18]:
 | connections
Black Widow | 20
Hulk | 20
Wolverine | 19
Spider-Man (Classic) | 18
Iron Man | 18
... | ...
King Groot | 4
Blade | 4
Killmonger | 4
Punisher 2099 | 3
Psylocke | 3

118 rows × 1 columns

We can then subset our data and visualize the connections of a character atop our list: Black Widow, who ties with Hulk at 20 connections.

In [19]:
msn_blackwidow = msn_nx[msn_nx["Char1_Name"]=="Black Widow"].reset_index()
msn_blackwidow
Out[19]:
 | level_0 | index | Line ID | Char1_Name | Char2_Name | Char1_ID | Char2_ID | Relation | Relation Sentiment | color
0 | 80 | 160 | 81 | Black Widow | Archangel | 10 | 5 | Teammates | Positive | green
1 | 81 | 162 | 82 | Black Widow | Black Panther (Civil War) | 10 | 9 | Friends | Positive | green
2 | 82 | 164 | 83 | Black Widow | Captain Marvel | 10 | 15 | Friends | Positive | green
3 | 83 | 166 | 84 | Black Widow | Crossbones | 10 | 19 | Rivals | Negative | red
4 | 84 | 168 | 85 | Black Widow | Daredevil (Classic) | 10 | 23 | Romance | Positive | green
5 | 85 | 170 | 86 | Black Widow | Elektra | 10 | 32 | Rivals | Negative | red
6 | 86 | 172 | 87 | Black Widow | Falcon | 10 | 33 | Enemies | Negative | red
7 | 87 | 174 | 88 | Black Widow | Hawkeye | 10 | 41 | Romance | Positive | green
8 | 88 | 176 | 89 | Black Widow | Hulk | 10 | 45 | Avengers | Positive | green
9 | 89 | 178 | 90 | Black Widow | Hulk (Ragnarok) | 10 | 46 | Lullaby | Positive | green
10 | 90 | 180 | 91 | Black Widow | Hulkbuster | 10 | 47 | Avengers | Positive | green
11 | 91 | 182 | 92 | Black Widow | Iceman | 10 | 49 | Teammates | Positive | green
12 | 92 | 184 | 93 | Black Widow | Ms. Marvel | 10 | 72 | Friends | Positive | green
13 | 93 | 186 | 94 | Black Widow | Quake | 10 | 81 | S.H.I.E.L.D Clearance | Neutral | black
14 | 94 | 188 | 95 | Black Widow | Sentry | 10 | 89 | Friends | Positive | green
15 | 95 | 190 | 96 | Black Widow | Thor (Jane Foster) | 10 | 102 | Friends | Positive | green
16 | 96 | 192 | 97 | Black Widow | Ultron | 10 | 104 | Enemies | Negative | red
17 | 97 | 194 | 98 | Black Widow | Void | 10 | 111 | Overcoming Fear | Neutral | black
18 | 98 | 196 | 99 | Black Widow | War Machine | 10 | 113 | Teammates | Positive | green
19 | 99 | 198 | 100 | Black Widow | Winter Soldier | 10 | 114 | Romance | Positive | green

We can now visualize Black Widow's network as before. This time, however, we will use the spring layout, which places Black Widow, our main entity of interest, in the middle of the graph.

In [20]:
# initialize plot
fig, ax = plt.subplots(figsize=(10, 8))

# Initialize a graph object
G = nx.from_pandas_edgelist(msn_blackwidow,
                            'Char1_Name', 
                            'Char2_Name',
                            edge_attr=["Relation","Relation Sentiment"])

# Draw using a spring layout
nx.draw_spring(G,with_labels=True,
               edge_color=[msn_blackwidow["color"][i] for i in list(range(len(msn_blackwidow)))],
               node_color = "gainsboro",
              node_size = 2000,
              font_size=11)

# Resize figure for label readability
fig.tight_layout()
plt.axis("off")
plt.margins(x=0.4)
plt.show()

This simple demonstration barely scratches the surface of the possibilities within NetworkX and the use cases for network analysis. One possible application is determining connected components, where we would like to identify distinct groups within our dataset.

Use case: Connected components

Adapted from Rahul Agarwal, Towards Data Science (August 2019)

Real-world scenarios where the connected components algorithm could be potentially useful are:

  • Edges may represent a variety of relationships. For instance, we could link individuals who share a mobile number or address and then group them according to these shared features.
  • In financial crime investigation, law enforcement authorities may quickly uncover webs of accounts or channels used by criminal groups.
  • Finally, network analysis could help in identifying clusters between geographical locations connected by transportation, which we shall demonstrate in the following exercise.
In [21]:
# load in data
cities = pd.read_csv("https://github.com/mkbunyi/Data-Viz-Tutorial-NetworkX/raw/main/distances.csv")
cities.head()
Out[21]:
 | node1 | node2 | distance
0 | Mannheim | Frankfurt | 85
1 | Mannheim | Karlsruhe | 80
2 | Erfurt | Wurzburg | 186
3 | Munchen | Numberg | 167
4 | Munchen | Augsburg | 84
In [22]:
# create graph object along with nodes and edges
g = nx.Graph()
for edge in range(len(cities)):
    g.add_edge(cities["node1"][edge],
               cities["node2"][edge], 
               weight = cities["distance"][edge])

NetworkX has a rich repertoire of graph analysis tools that don't require visualization. We could easily identify distinct groups of connected nodes by running our graph object through the connected_components function:

In [23]:
for i, x in enumerate(nx.connected_components(g)):
    print("cc"+str(i)+":",x)
cc0: {'Erfurt', 'Wurzburg', 'Mannheim', 'Frankfurt', 'Karlsruhe', 'Stuttgart', 'Augsburg', 'Numberg', 'Munchen', 'Kassel'}
cc1: {'Kolkata', 'Bangalore', 'Delhi', 'Mumbai'}
cc2: {'ALB', 'NY', 'TX'}
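
A few related helpers are sketched below on a three-edge stand-in graph (city pairs copied by hand, distances omitted): counting the components, extracting the largest one as a subgraph, and checking whether two cities are reachable from each other.

```python
import networkx as nx

g = nx.Graph([("Mannheim", "Frankfurt"), ("Mannheim", "Karlsruhe"),
              ("Delhi", "Mumbai")])

print(nx.number_connected_components(g))   # 2

# Largest component as a node set, then as its own subgraph.
largest = max(nx.connected_components(g), key=len)
sub = g.subgraph(largest)
print(sorted(sub.nodes))

# Are two cities reachable from each other?
print(nx.has_path(g, "Frankfurt", "Karlsruhe"))   # True
print(nx.has_path(g, "Frankfurt", "Delhi"))       # False
```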

We could also plot the graph and eyeball the results. This time, we will use the 'spring layout', where we can set k as the optimal distance between nodes. The higher the value for k, the larger the distance between nodes.

In [24]:
# set layout
pos = nx.spring_layout(g, k=5, seed=10)

# plot the network
nx.draw(g,pos,
        with_labels = True,  #labels nodes
        node_color='lightsteelblue',
        edge_color='slategrey')

# label edges (distance between cities)
edge_labels = nx.get_edge_attributes(g,'weight')
nx.draw_networkx_edge_labels(g,pos,edge_labels=edge_labels,
                            font_size=9,rotate=False)

plt.show()

Other Use Cases

(Adapted from Rahul Agarwal, Towards Data Science (August 2019) and the developer guide of open source graph database Neo4j)

1. Shortest path

Network analysis is also helpful in determining the shortest distance between two points. For instance, continuing the city distances example, we can compute the shortest distance from Frankfurt to Stuttgart and the path that we need to traverse to cover this distance. Similar algorithms are used in applications such as Google Maps routing, planning grocery shopping routes, and computing degrees of connection on LinkedIn.

In [25]:
print(nx.shortest_path_length(g, 'Frankfurt','Stuttgart',weight='weight'))
print(nx.shortest_path(g, 'Frankfurt','Stuttgart',weight='weight'))
503
['Frankfurt', 'Wurzburg', 'Numberg', 'Stuttgart']
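
The role of the weight argument is easier to see on a tiny graph. In the sketch below, the node names and distances are hypothetical and chosen so that the direct road is not the shortest one.

```python
import networkx as nx

# Hypothetical distances: the direct A-B road is longer than going via C.
g = nx.Graph()
g.add_weighted_edges_from([("A", "B", 10), ("A", "C", 3), ("C", "B", 4)])

# With weight="weight", edge weights are summed; without it, hops are counted.
print(nx.shortest_path(g, "A", "B", weight="weight"))          # ['A', 'C', 'B']
print(nx.shortest_path_length(g, "A", "B", weight="weight"))   # 7
print(nx.shortest_path(g, "A", "B"))                           # ['A', 'B']
```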

2. PageRank

This algorithm measures a node's importance based on the number and quality of the links pointing to it (in an undirected network, every link counts in both directions). Its use cases include:

  • Ranking of websites, tweets, or Facebook users
  • Determining most influential papers based on citations
  • Determining the central actors in organized criminal networks

Below is a sample graph showing PageRank scores within a Facebook user network. The PageRank algorithm gives a higher score to a user with many friends who themselves have extensive friend lists. The most influential user in this network is marked by a yellow dot.

image.png
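
NetworkX provides its own nx.pagerank function. The hand-rolled power iteration below is only a sketch to expose the idea, not the library's implementation. The function name pagerank_sketch, its damping and iterations parameters, and the link structure are illustrative assumptions, and the sketch assumes every node has at least one outgoing link.

```python
import networkx as nx

def pagerank_sketch(G, damping=0.85, iterations=100):
    """Plain power iteration; assumes every node has an outgoing edge."""
    n = G.number_of_nodes()
    rank = {node: 1.0 / n for node in G}
    for _ in range(iterations):
        new_rank = {}
        for node in G:
            # Each predecessor shares its rank equally among its out-links.
            incoming = sum(rank[p] / G.out_degree(p)
                           for p in G.predecessors(node))
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# Hypothetical link structure: B, C, and D all point to A.
G = nx.DiGraph([("A", "B"), ("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")])
scores = pagerank_sketch(G)
print(max(scores, key=scores.get))   # node with the highest score
```

Node D receives no links at all, so it ends up with the lowest score even though it links out to two other nodes.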

3. Centrality measures

NetworkX covers various centrality measures, some of which are as follows:

  • Degree Centrality. Computed as the fraction of all other nodes that a specific node is connected to, this is a crude popularity measure which tends to emphasize the number of connections over the quality of engagement in social networks. Weighted degree centrality has also been used to detect online auction fraud, where fraudsters "colluded with each other to artificially increase prices" and so were found to possess higher centrality compared to legitimate buyers.
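
As a quick sketch of degree centrality, consider a star network (a made-up example with one hub and four leaves). Each node's score is its degree divided by the number of other nodes.

```python
import networkx as nx

# Star network: node 0 is the hub, nodes 1-4 are leaves.
G = nx.star_graph(4)

dc = nx.degree_centrality(G)   # degree divided by (n - 1)
print(dc[0])   # 1.0: the hub touches all 4 other nodes
print(dc[1])   # 0.25: each leaf touches 1 of the 4 other nodes
```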

Below is a sample graph showing betweenness centrality within a Facebook user network. Larger nodes represent users with higher influence, who pass information between different groups.

image.png
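
A small sketch of the same idea uses two made-up triangles joined through a single bridging node X. Because every shortest path between the two groups passes through X, it receives the highest betweenness score.

```python
import networkx as nx

# Two triangles joined through a single bridging node "X".
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
              ("c", "X"), ("X", "d"),
              ("d", "e"), ("e", "f"), ("d", "f")])

bc = nx.betweenness_centrality(G)
print(max(bc, key=bc.get))   # the node lying on the most shortest paths
print(bc["a"])               # 0.0: no shortest path passes through "a"
```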